QTM 385 - Experimental Methods

Lecture 20 - Mediation Analysis

Danilo Freire

Emory University

Hello again! 👋

Brief recap 📚

Last class: Interference

  • Interference occurs when one unit’s treatment status affects another unit’s outcome (violating SUTVA)
  • Can be a nuisance or the object of study (e.g., network effects, spillovers)
  • Designs to handle/study interference:
    • Clustered randomisation (limits interference between clusters)
    • Multi-level designs (e.g., randomise households, then individuals within)
    • Spatial spillover models (require a careful definition of proximity/influence; IPW for estimation)
  • More designs:
    • Within-subject / Repeated Measures (units act as their own controls over time; needs no-anticipation and no-persistence assumptions, hence washout periods)
    • Waitlist designs / Stepped-wedge (everyone gets treated eventually; what is randomised is the timing of treatment)
  • Potential outcomes notation needs expansion to account for others’ treatment status (e.g., \(Y_{i}(Z_i, Z_{-i})\) or simplified versions like \(Y_{ab}\))
  • Estimation often involves inverse probability weighting (IPW) when assignment probabilities vary or are complex due to design

Today’s plan 📅

Mediation: Unpacking the Black Box

  • What is mediation analysis? The search for causal mechanisms
  • The classic regression-based approach (and why it’s often problematic)
  • Thinking about mediation with potential outcomes
  • The challenge of complex potential outcomes
  • Can we experimentally manipulate mediators? (Encouragement designs & excludability)
  • A more pragmatic approach: Implicit mediation analysis
    • Adding/subtracting treatment components
  • Examples: Conditional cash transfers, voter turnout mailers
  • Manipulation checks
  • Strengths and limitations of different approaches
  • Moving beyond simple ATEs to how effects happen

What is Mediation? 🤔

The Causal Chain: Z → M → Y

  • Experiments often tell us that a treatment (Z) affects an outcome (Y)
  • Mediation analysis asks how or why this effect occurs
  • It investigates the role of intermediate variables (mediators, M) that lie on the causal path between Z and Y
  • The core idea: Z causes M, and M causes Y
  • Examples:
    • Limes (Z) → Vitamin C intake (M) → Reduced scurvy (Y)
    • Reserving council seats for women (Z) → Female incumbency/Changed attitudes (M) → Election of women later (Y) (Bhavnani 2009)
  • Goal: Identify the pathways through which Z transmits its influence to Y

Why Care About Mechanisms?

  • Scientific Understanding: Moves beyond “black box” descriptions to explanatory theories. How does the world work?
  • Designing Better Interventions: If we know why something works, we might find more efficient or potent ways to achieve the same outcome (e.g., Vitamin C tablets instead of limes)
  • Refining Theory: Tests specific theoretical propositions about causal processes
  • Generalisability: Understanding the mechanism helps predict if an effect will hold in different contexts where the mediator might operate differently
  • Audiences and policymakers almost always ask about mechanisms!

The Traditional Approach: Regression 📈

The Three-Equation System

  • A very common approach (Baron & Kenny style) uses OLS regression:
    1. Regress the mediator (M) on the treatment (Z): \(M_i = \alpha_1 + a Z_i + e_{1i}\) Effect of Z on M: \(\hat{a}\)
    2. Regress the outcome (Y) on the treatment (Z): \(Y_i = \alpha_2 + c Z_i + e_{2i}\) Total effect of Z on Y: \(\hat{c}\)
    3. Regress the outcome (Y) on both the treatment (Z) and the mediator (M): \(Y_i = \alpha_3 + d Z_i + b M_i + e_{3i}\) Effect of M on Y: \(\hat{b}\); Direct effect of Z on Y: \(\hat{d}\)
  • Z is randomly assigned, but M is not. M is an outcome of Z.
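As a minimal sketch of the three-equation system, the snippet below fits Eqs. (1) to (3) on simulated data using only the Python standard library. The parameter values, the seed, and the helper names (`cov`, `ols_one`, `ols_two`) are assumptions made for illustration. The errors are drawn independently, which is the benign case and is not guaranteed in real data.

```python
import random
from statistics import fmean

def cov(x, y):
    """Covariance dividing by n (the divisor cancels in the slopes)."""
    mx, my = fmean(x), fmean(y)
    return fmean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])

def ols_one(y, x):
    """Slope from regressing y on x with an intercept."""
    return cov(x, y) / cov(x, x)

def ols_two(y, x1, x2):
    """Slopes on x1 and x2 from regressing y on both, with an intercept."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated data; all parameter values are hypothetical.
rng = random.Random(385)
n = 2000
a, b, d = 1.0, 2.0, 0.5
Z = [rng.randint(0, 1) for _ in range(n)]
M = [0.5 + a * z + rng.gauss(0, 1) for z in Z]
Y = [1.0 + d * z + b * m + rng.gauss(0, 1) for z, m in zip(Z, M)]

a_hat = ols_one(M, Z)             # Eq. (1): effect of Z on M
c_hat = ols_one(Y, Z)             # Eq. (2): total effect of Z on Y
d_hat, b_hat = ols_two(Y, Z, M)   # Eq. (3): direct effect and mediator effect

print(f"a_hat={a_hat:.3f} b_hat={b_hat:.3f} c_hat={c_hat:.3f} d_hat={d_hat:.3f}")
print(f"d_hat + a_hat*b_hat = {d_hat + a_hat * b_hat:.3f}")
```

In any sample, OLS algebra makes \(\hat{c}\) equal \(\hat{d} + \hat{a}\hat{b}\) up to rounding, so the two printed totals agree. That equality is arithmetic, not evidence that the causal decomposition is right.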

Decomposing Effects (Under Strong Assumptions)

  • If we assume constant treatment effects (a, b, c, d are the same for everyone) and certain other conditions hold…
  • The total effect (c) can be decomposed:
    • Total Effect (c): How much Y changes for a one-unit change in Z. Estimated from Eq. (2).
    • Direct Effect (d): How much Y changes for a one-unit change in Z, holding M constant. Estimated from Eq. (3).
    • Indirect Effect (ab): How much of Z’s effect on Y is transmitted through M. Calculated as \(\hat{a} \times \hat{b}\) (or \(\hat{c} - \hat{d}\)).
  • This decomposition (\(c = d + ab\)) is the cornerstone of traditional mediation analysis.
  • BUT: This relies heavily on the constant effects assumption.

The Constant Effects Assumption Revisited

  • Remember from heterogeneous effects: causal effects often vary!
  • If \(a_i\) and \(b_i\) vary across individuals, then the average indirect effect is \(E[a_i b_i]\)
  • \(E[a_i b_i] = E[a_i] E[b_i] + Cov(a_i, b_i)\)
  • Simply multiplying the average effect of Z on M (\(\hat{a} \approx E[a_i]\)) by the average effect of M on Y (\(\hat{b} \approx E[b_i]\)) only gives the average indirect effect if \(Cov(a_i, b_i) = 0\).
  • This covariance term is generally not zero and not identifiable.
  • So, the simple \(c = d + ab\) decomposition breaks down with heterogeneous effects.
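A tiny numerical illustration of the covariance term, using four made-up units (the values of `a` and `b` are hypothetical and chosen so that units with a larger effect of Z on M also have a larger effect of M on Y):

```python
from statistics import fmean

# Hypothetical unit-level effects for four units (illustrative values only).
a = [0.0, 0.0, 2.0, 2.0]   # a_i: effect of Z on M for each unit
b = [0.0, 0.0, 1.0, 1.0]   # b_i: effect of M on Y for each unit

avg_indirect = fmean([ai * bi for ai, bi in zip(a, b)])   # E[a_i b_i]
product_of_avgs = fmean(a) * fmean(b)                     # E[a_i] E[b_i]
covariance = avg_indirect - product_of_avgs               # Cov(a_i, b_i)

print(avg_indirect, product_of_avgs, covariance)  # 1.0 0.5 0.5
```

Here the product of the averages (0.5) is half the true average indirect effect (1.0), and the gap is exactly the covariance.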

The Problem with Equation (3): Endogeneity of M

  • Even if we ignore heterogeneous effects for a moment… Equation (3) is problematic: \(Y_i = \alpha_3 + d Z_i + b M_i + e_{3i}\)
  • Z is random, but M is an outcome variable. It’s very likely correlated with the error term \(e_{3i}\).
  • Why? Unobserved confounders might affect both M and Y.
  • Example (Bhavnani): Unmeasured local ‘egalitarianism’ (\(e_{3i}\)) might encourage women to run for office (M) and make voters more likely to elect women (Y), independently of the reservation policy (Z).
  • Including M in the regression is like including a post-treatment variable that is correlated with the error term – this leads to biased estimates of both b and d.

The Bias Problem in Estimating ‘b’ and ‘d’

  • If \(M_i\) is correlated with \(e_{3i}\) (because unobserved confounders make the mediator disturbance \(e_{1i}\) and the outcome disturbance \(e_{3i}\) move together), then OLS estimates from Eq. (3) are biased.
  • Let \(b\) and \(d\) be the true parameters. In large samples the estimates converge to:
    • \(\hat{b}_{N \to \infty} = b + \frac{Cov(e_{1i}, e_{3i})}{Var(e_{1i})}\) (assuming \(M_i = \alpha_1 + aZ_i + e_{1i}\))
    • \(\hat{d}_{N \to \infty} = d - a \times \frac{Cov(e_{1i}, e_{3i})}{Var(e_{1i})}\), i.e. \(d\) minus \(a\) times the bias in \(\hat{b}\)
  • If \(Cov(e_{1i}, e_{3i}) \neq 0\), then \(\hat{b}\) is biased.
  • Consequently, \(\hat{d}\) is also biased (unless a=0, meaning Z doesn’t affect M).
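A sketch of where these limits come from, assuming constant effects and that random assignment makes \(Z_i\) independent of both disturbances. The coefficient on \(M_i\) in Eq. (3) uses only the part of \(M_i\) not explained by \(Z_i\), which in large samples is \(e_{1i}\):

\[
\hat{b} \;\to\; \frac{Cov(e_{1i}, Y_i)}{Var(e_{1i})}
= \frac{Cov\big(e_{1i},\; \alpha_3 + d Z_i + b(\alpha_1 + a Z_i + e_{1i}) + e_{3i}\big)}{Var(e_{1i})}
= b + \frac{Cov(e_{1i}, e_{3i})}{Var(e_{1i})}
\]

For \(\hat{d}\), OLS always satisfies \(\hat{c} = \hat{d} + \hat{a}\hat{b}\) in sample. Randomisation gives \(\hat{a} \to a\) and \(\hat{c} \to d + ab\), so

\[
\hat{d} = \hat{c} - \hat{a}\hat{b} \;\to\; (d + ab) - a\left(b + \frac{Cov(e_{1i}, e_{3i})}{Var(e_{1i})}\right)
= d - a\,\frac{Cov(e_{1i}, e_{3i})}{Var(e_{1i})}
\]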

Direction of Bias & Interpretation Issues

  • Often, the unobserved factors affecting M and Y are positively correlated (e.g., motivation, ability, favourable environment). So, \(Cov(e_{1i}, e_{3i}) > 0\).
  • This means regression tends to:
    1. Overestimate the effect of the mediator (b). M looks more important than it is.
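    2. Underestimate the direct effect of the treatment (d) whenever Z raises M (\(a > 0\)), because the bias in \(\hat{d}\) is \(-a\) times the bias in \(\hat{b}\). Z looks like it only works through M.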
  • The net result: The mediation analysis often looks “successful” – the mediator seems crucial, and the direct path seems negligible – precisely because of the estimation bias!
  • Controlling for other covariates doesn’t solve this fundamental endogeneity problem unless they perfectly capture all confounding between M and Y.

Regression Approach Summary

  • Relies on strong, often implausible assumptions:
    • Constant treatment effects (for \(c = d + ab\))
    • No unobserved confounding between M and Y (for unbiased \(\hat{b}\) and \(\hat{d}\))
  • These assumptions are not guaranteed by random assignment of Z alone.
  • Prone to bias that exaggerates the mediator’s role and downplays the treatment’s direct effect.
  • Use with extreme caution, if at all, for causal mediation claims based solely on Z randomisation.

A Potential Outcomes View

Defining Potential Outcomes for Mediation

  • Let’s apply our familiar framework. For each individual \(i\):
    • \(Z_i\): Randomly assigned treatment (0 or 1)
    • \(M_i(z)\): Potential value of the mediator if assigned to treatment \(z\)
    • \(Y_i(z)\): Potential value of the outcome if assigned to treatment \(z\)
  • Observed variables:
    • \(M_i = M_i(Z_i)\)
    • \(Y_i = Y_i(Z_i)\)
  • Average Treatment Effect (Total Effect): \(ATE = E[Y_i(1) - Y_i(0)]\)
  • Effect of Z on M: \(E[M_i(1) - M_i(0)]\)
  • These are estimable from a standard experiment randomising Z.

Example Where Regression Fails (Table 10.1 Logic)

  • Consider a scenario (like Table 10.1 in the text):
    • Suppose Z truly affects both M and Y directly.
    • Let the true effect of M on Y be zero (\(b=0\)).
    • Let the true direct effect of Z on Y be one (\(d=1\)).
    • Let the true effect of Z on M be one (\(a=1\)).
    • Total effect \(c = 1\).
    • Suppose there are unobserved factors (\(e_{1i}, e_{3i}\)) that are correlated and influence M and Y.
  • What happens with regression?
    • Eq (1) correctly estimates \(\hat{a} \approx 1\).
    • Eq (2) correctly estimates \(\hat{c} \approx 1\).
    • BUT: Eq (3) gives biased results due to the correlation between M (affected by \(e_{1i}\)) and \(e_{3i}\). It might estimate \(\hat{b} \approx 1\) and \(\hat{d} \approx 0\).
  • Regression falsely suggests M mediates the entire effect, and Z has no direct effect! This matches the bias direction discussed earlier.
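The scenario can be mimicked with a small simulation. This is a sketch in the spirit of the example, not a reproduction of Table 10.1: the sample size, seed, and error distributions are assumptions, chosen so that \(Cov(e_{1i}, e_{3i})/Var(e_{1i}) = 1\).

```python
import random
from statistics import fmean

def cov(x, y):
    """Covariance dividing by n (the divisor cancels in the slopes)."""
    mx, my = fmean(x), fmean(y)
    return fmean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])

def ols_one(y, x):
    """Slope from regressing y on x with an intercept."""
    return cov(x, y) / cov(x, x)

def ols_two(y, x1, x2):
    """Slopes on x1 and x2 from regressing y on both, with an intercept."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(10)
n = 5000
a, b, d = 1.0, 0.0, 1.0            # true values: M has NO effect on Y
Z, M, Y = [], [], []
for _ in range(n):
    z = rng.randint(0, 1)
    e1 = rng.gauss(0, 1)           # unobserved factor moving the mediator
    e3 = e1 + rng.gauss(0, 0.5)    # shares e1, so Cov(e1, e3) / Var(e1) = 1
    m = a * z + e1
    Z.append(z)
    M.append(m)
    Y.append(d * z + b * m + e3)

a_hat = ols_one(M, Z)              # close to 1
c_hat = ols_one(Y, Z)              # close to 1
d_hat, b_hat = ols_two(Y, Z, M)    # b_hat close to 1, d_hat close to 0

print(f"a_hat={a_hat:.2f} c_hat={c_hat:.2f} b_hat={b_hat:.2f} d_hat={d_hat:.2f}")
```

Equations (1) and (2) recover the truth because Z is randomised, while Eq. (3) attributes the whole effect to a mediator that, by construction, does nothing.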

Expanding Potential Outcomes: Yᵢ(m, z)

  • To formally define direct and indirect effects, we need a more complex notation (Imai, Keele, Yamamoto 2010; Robins & Greenland 1992):
  • \(Y_i(m, z)\): Potential outcome for individual \(i\) if their mediator were set to value \(m\) AND they were assigned treatment \(z\).
  • This allows us to think about hypothetical scenarios:
    • What would \(Y\) be if we gave the treatment (\(z=1\)) but somehow forced the mediator to the level it would have taken under control (\(m=M_i(0)\))? This would be \(Y_i(M_i(0), 1)\).
  • Linking back: \(Y_i(1) = Y_i(M_i(1), 1)\) and \(Y_i(0) = Y_i(M_i(0), 0)\).

Defining Effects with Y(m, z)

  • Using this notation, we can define effects more precisely:
  • Total Effect (ATE): \(E[Y_i(1) - Y_i(0)] = E[Y_i(M_i(1), 1) - Y_i(M_i(0), 0)]\)
  • Controlled Direct Effect (CDE): Effect of Z on Y, holding M fixed at some level \(m\).
    • \(CDE(m) = E[Y_i(m, 1) - Y_i(m, 0)]\)
  • Natural Direct Effect (NDE): Effect of Z on Y if M were held at the level it would have taken under control.
    • \(NDE = E[Y_i(M_i(0), 1) - Y_i(M_i(0), 0)]\)
  • Natural Indirect Effect (NIE): Effect of M changing from \(M_i(0)\) to \(M_i(1)\), holding Z fixed (usually at Z=1 for policy relevance).
    • \(NIE = E[Y_i(M_i(1), 1) - Y_i(M_i(0), 1)]\)
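  • With these definitions, \(ATE = NDE + NIE\) holds by construction: the \(Y_i(M_i(0), 1)\) terms cancel when the two are added. The hard part is estimating the pieces, not the decomposition.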
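To make the definitions concrete, here is a toy calculation with a full schedule of potential outcomes for three hypothetical units and a binary mediator. Every number is made up, and no real study ever observes this full schedule.

```python
from statistics import fmean

# Hypothetical potential outcomes (all values made up for illustration).
# M0, M1: mediator under control / treatment.
# Y[(m, z)]: outcome if the mediator were set to m and treatment to z.
units = [
    {"M0": 0, "M1": 1, "Y": {(0, 0): 0, (1, 0): 1, (0, 1): 2, (1, 1): 4}},
    {"M0": 0, "M1": 0, "Y": {(0, 0): 1, (1, 0): 1, (0, 1): 2, (1, 1): 2}},
    {"M0": 1, "M1": 1, "Y": {(0, 0): 0, (1, 0): 3, (0, 1): 0, (1, 1): 3}},
]

def y(u, m, z):
    """Look up Y_i(m, z) for unit u."""
    return u["Y"][(m, z)]

ate = fmean([y(u, u["M1"], 1) - y(u, u["M0"], 0) for u in units])
nde = fmean([y(u, u["M0"], 1) - y(u, u["M0"], 0) for u in units])
nie = fmean([y(u, u["M1"], 1) - y(u, u["M0"], 1) for u in units])
cde_at_0 = fmean([y(u, 0, 1) - y(u, 0, 0) for u in units])
cde_at_1 = fmean([y(u, 1, 1) - y(u, 1, 0) for u in units])

print(f"ATE={ate:.3f} NDE={nde:.3f} NIE={nie:.3f}")
print(f"CDE(0)={cde_at_0:.3f} CDE(1)={cde_at_1:.3f}")
```

In this toy schedule the NDE and NIE sum to the ATE, and the controlled direct effect depends on the level at which M is held, so a CDE is not a stand-in for the NDE in general.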

The Challenge: Complex Potential Outcomes

  • Look closely at the definitions of NDE and NIE:
    • Both NDE and NIE involve the same term: \(Y_i(M_i(0), 1)\)
  • These are “complex” or “counterfactual” potential outcomes.
    • \(Y_i(M_i(0), 1)\) represents the outcome under treatment (Z=1), but with the mediator at the level it would take under control (M=M(0)).
  • This state can never happen in reality! If \(Z_i=1\), we observe \(M_i(1)\), not \(M_i(0)\).
  • These potential outcomes are fundamentally unobservable from any single experiment just randomising Z. (See Box 10.2)

Fundamental Problem Recap

  • Estimating the theoretically precise Natural Direct Effect (NDE) and Natural Indirect Effect (NIE) requires knowing the values of complex potential outcomes like \(Y_i(M_i(0), 1)\).
  • Since these are unobservable, we cannot estimate NDE and NIE without making strong assumptions.
  • Common assumptions include versions of “sequential ignorability” (Imai et al.), essentially assuming M is “as-if randomised” conditional on Z and pre-treatment covariates – very similar to the assumption needed for the regression approach to be unbiased.
  • Just randomising Z is not sufficient to identify mediation pathways without these extra assumptions.

Can We Address This? 🤔

Ruling Out Mediators (A Modest Goal)

  • What if we can show that Z has no effect on M?
  • If \(M_i(1) = M_i(0)\) for all individuals \(i\) (the sharp null hypothesis), then M cannot possibly mediate the effect of Z.
  • In this case, the complex potential outcomes simplify: \(Y_i(M_i(0), 1) = Y_i(M_i(1), 1) = Y_i(1)\). The NIE becomes zero.
  • How to test this?
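    • Estimate the average effect: \(E[M_i(1) - M_i(0)]\). An estimate close to zero with a tight confidence interval is consistent with no effect, though on its own it cannot show that every unit-level effect is zero.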
    • Test for heterogeneity: Check if \(Var(M_i(1))\) differs from \(Var(M_i(0))\). (As discussed in Lecture 18). If variances are similar and ATE is zero, it lends support to the sharp null.
  • Ruling out mediators is easier than quantifying mediation. Useful for eliminating hypotheses.

Manipulating the Mediator Directly?

  • What if we design an experiment that randomises both Z and M? (A factorial design!)
  • Example: 2x2 design
    • Group 1: Z=0, M=0 (Control)
    • Group 2: Z=0, M=1 (Manipulate M only)
    • Group 3: Z=1, M=0 (Manipulate Z, block M?)
    • Group 4: Z=1, M=1 (Manipulate Z and M)
  • This helps estimate Controlled Direct Effects (CDEs). E.g., Effect of Z holding M at 0 is (Group 3 - Group 1). Effect of M holding Z at 1 is (Group 4 - Group 3).
  • BUT: This still doesn’t directly estimate the NDE or NIE, because M is set experimentally, not allowed to take its “natural” value \(M_i(z)\).
  • Also, manipulating M might be hard or artificial (e.g., giving Vitamin C tablets vs. limes might have different effects beyond Vitamin C).

The Encouragement Design Analogy

  • Often, we can’t directly set M. Instead, we use an “encouragement” Z to try and influence M.
  • Think back to non-compliance and Instrumental Variables (IV):
    • Z = Assignment/Encouragement (e.g., offer tutoring)
    • M = Treatment Received (e.g., actually attend tutoring)
    • Y = Outcome (e.g., test score)
  • We can use Z as an instrument for M to estimate the effect of M on Y for Compliers (CACE/LATE).
  • This is sometimes framed as a mediation analysis: Z affects Y through its effect on M.
  • Formula: \(CACE_{M \to Y} = \frac{ITT_{Y}}{ITT_{M}} = \frac{E[Y|Z=1] - E[Y|Z=0]}{E[M|Z=1] - E[M|Z=0]}\)
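A minimal sketch of the ratio estimator on simulated encouragement data. The complier share, effect size, noise level, and seed are all hypothetical, and the simulation builds in the exclusion restriction by letting Z enter the outcome only through M.

```python
import random
from statistics import fmean

rng = random.Random(20)
n = 20000
effect_m_on_y = 5.0   # hypothetical effect of receiving M on Y

rows = []
for _ in range(n):
    z = rng.randint(0, 1)                  # random encouragement
    complier = rng.random() < 0.4          # 40% compliers, the rest never-takers
    m = 1 if (z == 1 and complier) else 0  # only encouraged compliers take up M
    y = 50 + effect_m_on_y * m + rng.gauss(0, 5)   # Z affects Y only via M
    rows.append((z, m, y))

def mean_by_z(col, z_value):
    """Mean of column `col` (1 = M, 2 = Y) among units with Z == z_value."""
    return fmean([r[col] for r in rows if r[0] == z_value])

itt_m = mean_by_z(1, 1) - mean_by_z(1, 0)   # E[M|Z=1] - E[M|Z=0]
itt_y = mean_by_z(2, 1) - mean_by_z(2, 0)   # E[Y|Z=1] - E[Y|Z=0]
cace = itt_y / itt_m

print(f"ITT_M={itt_m:.3f} ITT_Y={itt_y:.3f} CACE={cace:.3f}")
```

The ratio lands near the simulated effect of 5 only because the data-generating process satisfies the exclusion restriction; nothing in the arithmetic checks that assumption.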

The Excludability Problem for IV/Mediation

  • For the IV estimate of the effect of M on Y to be valid, we need the exclusion restriction.
  • In the mediation context, this means Z must affect Y only through M. There can be no direct path from Z to Y (\(d=0\)).
  • This is a very strong assumption! Often, the encouragement (Z) might affect the outcome (Y) through other channels besides the intended mediator (M).
  • Example (Bhavnani): Seat reservations (Z) might affect future elections (Y) not just by creating incumbents (M1), but also by changing voter attitudes (M2) or mobilising different voters (M3).
  • If Z has a direct effect or affects multiple mediators, the simple IV approach doesn’t isolate the effect through one specific M.
  • Identifying effects through multiple mediators requires multiple encouragements (instruments) with different effects on the mediators – very complex!
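A back-of-the-envelope calculation shows what the IV ratio recovers when the exclusion restriction fails. It borrows the constant-effects model of Eqs. (1) to (3) purely for illustration, with Z randomly assigned and \(a \neq 0\):

\[
\frac{ITT_Y}{ITT_M} = \frac{c}{a} = \frac{d + ab}{a} = b + \frac{d}{a}
\]

The ratio equals \(b\) only when \(d = 0\). Any direct path from Z to Y is rescaled by \(1/a\) and attributed to the mediator, so a weak encouragement (small \(a\)) magnifies the error.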

A Different Approach: Implicit Mediation 💡

Scaling Back Ambitions: Focus on Treatment Components

  • Given the challenges of formally estimating direct/indirect effects or using IV with strong assumptions…
  • An alternative: Implicit Mediation Analysis.
  • Instead of trying to measure M and model the Z → M → Y pathway explicitly…
  • Focus on the treatment Z itself. Many treatments are “bundles” of different components.
  • Design experiments that add or subtract specific components of the treatment bundle.
  • Compare the effects of these different treatment variations.
  • This implicitly tests the importance of the components (and the mediators they likely affect) without needing to measure M or make strong assumptions about unobservables.

Example: Conditional Cash Transfers (CCTs)

  • CCT programmes give cash to poor families if they meet certain conditions (e.g., school attendance, health check-ups).
  • Potential mediators: Increased income (cash effect), Changed behaviour due to rules (conditionality effect).
  • Implicit Mediation Design: (e.g., Baird et al. 2009)
    • Group 1: Control (no programme)
    • Group 2: Unconditional Cash Transfer (UCT - gets cash, no rules)
    • Group 3: Conditional Cash Transfer (CCT - gets cash + rules)
  • Comparisons:
    • (Group 2 - Group 1): Effect of cash alone.
    • (Group 3 - Group 1): Effect of cash + conditions.
    • (Group 3 - Group 2): Effect of conditions (holding cash constant).
  • This tells us about the importance of the conditions (likely mediator: behaviour change) vs. the cash (likely mediator: income) without directly measuring parental behaviour or income changes and modelling them.

Example: Voter Turnout Postcards (Gerber, Green, Larimer 2008)

  • Famous experiment testing social pressure effects on voting.
  • Treatment “ingredients” gradually added:
    • Group 1: Control (no mail)
    • Group 2: “Civic Duty” mailer (basic encouragement)
    • Group 3: “Hawthorne” mailer (Civic Duty + told they are being studied)
    • Group 4: “Self” mailer (Hawthorne + shown own household’s past voting record)
    • Group 5: “Neighbors” mailer (Self + shown neighbours’ past voting records)
  • Implicit mediators: Sense of duty, being watched, accountability for own record, comparison to neighbours.

Voter Turnout Results (Table 10.2)

Illustrative Turnout Rates:

| Group | Treatment components | Turnout (%) | Effect vs control (percentage points) |
|---|---|---|---|
| 1. Control | None | 29.7 | |
| 2. Civic Duty | Encouragement | 31.5 | +1.8 |
| 3. Hawthorne | Encouragement + Monitoring | 32.2 | +2.5 |
| 4. Self | Encouragement + Monitoring + Own Record | 34.5 | +4.8 |
| 5. Neighbors | Encouragement + Monitoring + All Records | 37.8 | +8.1 |

(Based on Gerber, Green, and Larimer 2008)

  • Clear pattern: Adding social pressure components significantly increases turnout.
  • Comparing Group 5 vs 4 suggests disclosing neighbours’ records adds ~3.3 percentage points.
  • Comparing Group 4 vs 3 suggests disclosing own record adds ~2.3 percentage points.
  • Tells us which ingredients matter without modelling psychological states.
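The comparisons above are simple differences between adjacent arms. A short sketch using the turnout rates listed in the table (the variable names are illustrative):

```python
# Turnout rates (%) as listed in the table above, in the order components were added.
turnout = {
    "Control": 29.7,
    "Civic Duty": 31.5,
    "Hawthorne": 32.2,
    "Self": 34.5,
    "Neighbors": 37.8,
}
arms = list(turnout)

# Effect of each mailer relative to control, in percentage points.
vs_control = {arm: round(turnout[arm] - turnout["Control"], 1) for arm in arms[1:]}

# Increment from adding one more ingredient: each arm minus the arm before it.
increment = {
    arms[i]: round(turnout[arms[i]] - turnout[arms[i - 1]], 1)
    for i in range(1, len(arms))
}

print(vs_control)  # {'Civic Duty': 1.8, 'Hawthorne': 2.5, 'Self': 4.8, 'Neighbors': 8.1}
print(increment)   # {'Civic Duty': 1.8, 'Hawthorne': 0.7, 'Self': 2.3, 'Neighbors': 3.3}
```

Each increment is itself a comparison of two randomly assigned groups, which is what keeps the analysis design-based. A full analysis would also attach standard errors to each difference.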

Interpreting Implicit Mediation

  • This approach identifies which aspects of a complex intervention drive the effect.
  • It suggests the likely importance of different mediating processes (e.g., social comparison is powerful for turnout) without needing to directly measure or model the mediator variable (e.g., feelings of shame/pride).
  • It’s inherently design-based and stays within the clean framework of comparing randomly assigned groups.
  • Very useful for programme evaluation and refinement. What are the active ingredients?

Manipulation Checks

  • Even with implicit mediation, it’s often useful to include manipulation checks (Box 10.3).
  • These are measures of the intended intermediate effects of the treatment components.
  • Example (CCT): Measure school enrolment rates. We expect CCT and UCT to affect income, but CCT to affect enrolment more directly due to conditions. Checking if enrolment differs between CCT and UCT groups helps confirm the conditionality mechanism is plausible.
  • Example (Turnout): Ask recipients if they felt monitored or compared themselves to neighbours (though self-reports can be tricky).
  • Manipulation checks verify that the treatment components likely engaged the intended intermediate process, even if we don’t use that measure as a mediator in a formal model.

Strengths of Implicit Mediation

  • Avoids bias inherent in regression-based mediation with non-randomised mediators.
  • Stays within the unbiased statistical framework of comparing randomised groups.
  • Lends itself to exploration and discovery of effective treatment variations.
  • Relies on weaker, design-based assumptions rather than statistical assumptions about unobservables.
  • Often more practical and feasible than trying to perfectly manipulate or measure mediators in field settings.

Conclusion

Key Takeaways on Mediation

  • Mediation analysis seeks to understand how a treatment Z affects an outcome Y through an intermediate variable M (causal mechanisms).
  • Traditional regression approaches are widespread but problematic:
    • Rely on constant effects assumption.
    • Suffer from bias due to unobserved confounding between M and Y.
    • Tend to overestimate M’s role and underestimate Z’s direct role.
  • The potential outcomes framework reveals fundamental challenges:
    • Estimating natural direct/indirect effects requires observing complex potential outcomes that are inherently unobservable.
    • Randomising Z alone is insufficient; strong assumptions are needed.
  • Experimentally manipulating M helps estimate controlled effects but doesn’t fully solve the problem and may be artificial.
  • Using Z as an instrument for M requires the strong exclusion restriction (no direct Z → Y path), often violated.
  • Ruling out mediators (showing Z doesn’t affect M) is a more modest but achievable goal.
  • Implicit mediation analysis is a pragmatic, design-based alternative:
    • Varies treatment components experimentally.
    • Compares effects to infer which components (and likely mechanisms) matter.
    • Avoids modelling M directly, relies on fewer assumptions.

Final Thoughts

  • Be critical of causal mediation claims, especially those based solely on standard regression methods after randomising only Z.
  • Ask about the assumptions being made (constant effects? no confounding? exclusion restriction?).
  • Favour design-based approaches where possible.
  • Implicit mediation offers a robust way to gain insights into mechanisms by comparing different versions of a treatment.
  • Understanding mechanisms is crucial, but requires careful thought about identification strategies!

Thanks very much! 😊

See you next time! 👋